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Abstract 

Currently, it appears that the best method for non-Gaussianity detection in the Cosmic 
Microwave Background (CMB) consists in calculating the kurtosis of the wavelet coefficients. 
We know that wavelet-kurtosis outperforms other methods such as the bispectrum, the genus, 
ridgelet-kurtosis and curvelet-kurtosis on an empirical basis, but relatively few studies have 
compared other transform-based statistics, such as extreme values, or more recent tools such 
as Higher Criticism (HC), or proposed ‘best possible’ choices for such statistics. 

In this paper we consider two models for transform-domain coefficients: (a) a power- 
law model, which seems suited to the wavelet coefficients of simulated cosmic strings; and 
(b) a sparse mixture model, which seems suitable for the curvelet coefficients of filamentary 
structure. For model (a), if power-law behavior holds with finite 8-th moment, excess kurtosis 
is an asymptotically optimal detector, but if the 8-th moment is not finite, a test based on 
extreme values is asymptotically optimal. For model (b), if the transform coefficients are 
very sparse, a recent test. Higher Criticism, is an optimal detector, but if they are dense, 
kurtosis is an optimal detector. Empirical wavelet coefficients of simulated cosmic strings 
have power-law character, infinite 8-th moment, while curvelet coefficients of the simulated 
cosmic strings are not very sparse. In all cases, excess kurtosis seems to be an effective test 
in moderate-resolution imagery. 
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1 Introduction 


The Cosmic Microwave Background (CMB), discovered in 1965 by Penzias and Wilson is a 
relic of radiation emitted some 13 billion years ago, when the Universe was abont 370.000 years 
old. This radiation exhibits characteristic of an almost perfect blackbody at a temperature of 
2.726 Kelvin as measured by the FIRAS experiment on board COBE satellite [2111. DMR 
experiment, again on board COBE, detected and measured angular small fluctuations of this 
temperature, at the level of a few tens of micro Kelvin, and at angular scale of about 10 degrees 
M- These so-called temperature anisotropies were predicted as the imprints of the initial 
density perturbations which gave rise to present large scale strnctures as galaxies and clusters 
of galaxies. This relation between the present-day universe and its initial conditions has made 
the CMB radiation one of the preferred tools of cosmologists to understand the history of the 
universe, the formation and evolution of the cosmic structures and physical processes responsible 
for them and for their clustering. 

As a consequence, the last several years have been a particularly exciting period for obser¬ 
vational cosmology focussing on the CMB. With CMB balloon-borne and ground-based exper¬ 
iments such as TOCO {SHI; BOOMERanG [10], MAXIMA |24] . DASI [23| and Archeops |H], 
a firm detection of the so-called “first peak” in the CMB anisotropy angular power spectrum 
at the degree scale was obtained. This detection was very recently confirmed by the WMAP 
satellite |7], which detected also the second and third peaks. WMAP satellite mapped the CMB 
temperature fluctuations with a resolution better that 15 arc-minutes and a very good accu¬ 
racy marking the starting point of a new era of precision cosmology that enables us to use the 
CMB anisotropy measurements to constrain the cosmological parameters and the underlying 
theoretical models. 

Eigure 1: Courtesy of the WMAP team (reference to the website). All sky map of the CMB 
anisotropies measured by the WMAP satellite. 

In the framework of adiabatic cold dark matter models, the position, amplitude and width of 
the first peak indeed provide strong evidence for the inflationary predictions of a flat universe and 
a scale-invariant primordial spectrum for the density perturbations. Eurthermore, the presence of 
second and third peaks, confirm the theoretical prediction of acoustic oscillations in the primeval 
plasma and shed new light on various cosmological and inflationary parameters, in particular, 
the baryonic content of the universe. The accurate measurements of both the temperature 
anisotropies and polarised emission of the CMB will enable us in the very near future to break 
some of the degeneracies that are still affecting parameter estimation. It will also allow us to 
probe more directly the inflationary paradigm favored by the present observations. 

Testing the inflationary paradigm can also be achieved through detailed study of the statisti¬ 
cal nature of the CMB anisotropy distribution. In the simplest inflation models, the distribution 
of CMB temperature fluctuations should be Gaussian, and this Gaussian field is completely de¬ 
termined by its power spectrum. However, many models such as multi-field inflation (e.g. [21 
and references therein), super strings or topological defects, predict non-Gaussian contributions 
to the initial fluctuations ISS1I2SI111 The statistical properties of the CMB should discriminate 
models of the early universe. Nevertheless, secondary effects like the inverse Compton scattering, 
the Doppler effect, lensing and others add their own contributions to the total non-Gaussianity. 

All these sonrces of non-Ganssian signatures might have different origins and thus different 
statistical and morphological characteristics. It is therefore not surprising that a large number 
of studies have recently been devoted to the subject of the detection of non-Gaussian signatures. 
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Many approaches have been investigated: Minkowski functionals and the morphological statistics 
[smii], the bispectrum (3-point estimator in the Fourier domain) |1111491 HUj . the trispectrum 
(4-point estimator in the Fourier domain) [SJ, wavelet transforms [Tl 121 L IHl II hi I2h[ 144] . and 
the curvelet transform M- Different wavelet methods have been studied, such as the isotropic 
a trous algorithm uni and the bi-orthogonal wavelet transform inn. (The bi-orthogonal wavelet 
transform was found to be the most sensitive to non-Gaussianity mi). In 0 m] ) it was shown 
that the wavelet transform was a very powerful tool to detect the non-Gaussian signatures. 
Indeed, the excess kurtosis (4th moment) of the wavelet coefficients outperformed all the other 
methods (when the signal is characterised by a non-zero 4th moment). 

Nevertheless, a major issue of the non-Gaussian studies in CMB remains our ability to 
disentangle all the sources of non-Gaussianity from one another. Recent progress has been 
made on the discrimination between different possible origins of non-Gaussianity. Namely, it 
was possible to separate the non-Gaussian signatures associated with topological defects (cosmic 
strings (GS)) from those due to Doppler effect of moving clusters of galaxies (both dominated 
by a Gaussian CMB field) by combining the excess kurtosis derived from both the wavelet and 
the curvelet transforms mi- 

This success argues for us to construct a “toolkit” of well-understood and sensitive methods 
for probing different aspects of the non-Gaussian signatures. 

In that spirit, the goal of the present study is to consider the advantages and limitations of 
detectors which apply kurtosis to transform coefficients of image data. We will study plausible 
models for transform coefficients of image data and compare the performance of tests based on 
kurtosis of transform coefficients to other types of statistical diagnostics. 

At the center of our analysis are two facts 

[A] The wavelet/curvelet coefficients of CMB are Gaussian (we implicitly assume the most 
simple inflationary scenario). 

[B] The wavelet/curvelet coefficients of topological defect and Doppler effect simulations are 
non-Gaussian. 

We develop tests for non-Gaussianity for two models of statistical behavior of transform 
coefficients. The hrst, better suited for wavelet analysis, models transform coefficients of cosmic 
strings as following a power law. The second, theoretically better suited for curvelet coefficients, 
assumes that the salient features of interest are actually filamentary (it can be residual strips 
due do a non perfect calibration), which gives the curvelet coefficients a sparse structure. 

We review some basic ideas from detection theory, such as likelihood ratio detectors, and 
explain why we prefer non-parametric detectors, valid across a broad range of assumptions. 

In the power-law setting, we consider two kinds of non-parametric detectors. The first, based 
on kurtosis, is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian 
contamination with finite 8-th moments. The second, the Max, is shown to be asymptotically 
optimal in the class of weakly dependent symmetric non-Gaussian contamination with infinite 
8-th moment. While the evidence seems to be that wavelet coefficients of CS have about 6 
existing moments - indicating a decisive advantage for extreme-value statistics - the performance 
of kurtosis-based tests and Max-based tests on moderate sample sizes (eg. 64K transform 
coefficients) does not follow the asymptotic theory; excess kurtosis works better at these sample 
sizes. 

In the sparse-coefficients setting, we consider kurtosis, the Max, and a recent statistic called 
Higher Criticism (HC) |19j . Theoretical analysis suggests that curvelet coefficients of filamentary 
features should be sparse, with about substantial nonzero coefficients out of n coefficients 
in a subband; this level of sparsity would argue in favor of Max/HG. However, empirically, the 
curvelet coefficients of actual CS simulations are not very sparse. It turns out that kurtosis 
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Figure 2: Detectable regions in the a — r plane. With (a, r) in the white region on the top or the 
undetectable region, all methods completely fail for detection. With (a, r) in the white region 
on the bottom, both excess kurtosis and Max/HC are able to detect reliably. While in the blue 
region to the left, Max/HC is able to detect reliably, but excess kurtosis completely fails, and in 
the yellow region to the right, excess kurtosis is able to detect reliably, but Max/HC completely 
fail. 

outperforms Max/HC in simulation. 

Summarizing, the work reported here seems to show that for all transforms considered, the 
excess kurtosis outperforms alternative methods despite their strong theoretical motivation. A 
reanalysis of the theory supporting those methods shows that the case for kurtosis can also be 
justified theoretically based on observed statistical properties of the transform coefficients not 
used in the original theoretic analysis. 

2 Detecting Faint Non-Gaussian Signals Superposed on a Gaus¬ 
sian Signal 

The superposition of a non-Gaussian signal with a Gaussian signal can be modeled asY = N+G, 
where Y is the observed image, N is the non-Gaussian component and G is the Gaussian 
component. We are interested in using transform coefficients to test whether = 0 or not. 

2.1 Hypothesis Testing and Likelihood Ratio Test (LRT). 

Transform coefficients of various kinds [Fourier, wavelet, etc.] have been used for detecting non- 
Gaussian behavior in numerous studies. Let Xi,X 2 , ■ ■ ■, Xn be the transform coefficients of Y ; 
we model these as 

Xi = \/l — A • Zi + \/A • Wi, 0 < A < 1, (2-1) 

where A > 0 is a parameter, Zi N{0, 1) are the transform coefficients of the Gaussian com¬ 
ponent G, Wi W are the transform coefficients of the non-Gaussian component N, and W 
is some unknown symmetrical distribution. Here without loss of generality, we assume the 
standard deviation for both Zi and Wi are 1. 

Phrased in statistical terms, the problem of detecting the existence of a non-Gaussian com¬ 
ponent is equivalent to discriminating between the hypotheses: 

Ho: Xi = Zi, (2.2) 

Hi : Xi = \/l — A • Zi + \/A • Wi, 0 < A < 1, (2-3) 

and = 0 is equivalent to A = 0. We call Hq the null hypothesis Hq, and Hi the alternative 
hypothesis. 

When both W and A are known, then the optimal test for Problem Tn\i - 1131) is simply 
the Neyman-Pearson Likelihood ratio test (LRT), |.12[ Page 74 ]. The size of A = A^ for which 
reliable discrimination between Hq and Hi is possible can be derived using asymptotics. If we 
assume that the tail probability of W decays algebraically, 

lim x“P{|VF| > x} = Ga, Ga is a constant, (2-4) 

X^OQ 
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(we say W has a power-law tail), and we calibrate A to decay with n, so that increasing amounts 
of data are offset by increasingly hard challenges: 


X = \n = n 


then there is a threshold effect for the detection problem ^ - (EH). In fact, define: 


p\{a) 


2/a, a < 8, 
1/4, a > 8, 


(2.5) 


( 2 . 6 ) 


then as n ^ oo, LRT is able to reliably detect for large n when r < Pi{a), and is unable to 
detect when r > p\{a)-, this is proved in EH- Since LRT is optimal, it is not possible for any 
statistic to reliably detect when r > p\{a). We call the curve r = p\{a) in the a-r plane the 
detection boundary, see Figured 

In fact, when r < 1/4, asymptotically LRT is able to reliably detect whenever W has a finite 
8-th moment, even without the assumption that W has a power-law tail. Of course, the case 
that W has an infinite 8-th moment is more complicated, but if W has a power-law tail, then 
LRT is also able to reliably detect if r < 2/a. 

Despite its optimality, LRT is not a practical procedure. To apply LRT, one needs to specify 
the value of A and the distribution of W, which seems unlikely to be available. We need non- 
parametric detectors, which can be implemented without any knowledge of A or W, and depend 
on Xi’s only. In the section below, we are going to introduce two non-parametric detectors: 
excess kurtosis and Max; later in Section SH we will introduce a third non-parametric detector: 
Higher Criticism (HC). 


2.2 Excess Kurtosis and Max 

We pause to review the concept of p-value briefly. For a statistic T„, the p-value is the probability 
of seeing equally extreme results under the null hypothesis: 


P = PHo{Tn > tniXl,X2, . . . , Xn)}', 


here Phq refers to probability under Hq, and tniXi,X 2 , ■ ■ ■, Xn) is the observed value of statistic 
Tn- Notice that the smaller the p-value, the stronger the evidence against the null hypothesis. 
A natural decision rule based on p-values rejects the null when p < a for some selected level a, 
and a convenient choice is a = 5%. When the null hypothesis is indeed true, the p-values for any 
statistic are distributed as uniform C/(0,1). This implies that the p-values provide a common 
scale for comparing different statistics. 

We now introduce two statistics for comparison. 

Excess Kurtosis (/tn)- Excess kurtosis is a widely used statistic, based on the 4-th moment. 
For any (symmetrical) random variable X, the kurtosis is: 

EX^ 

= (EX2)2 " 


The kurtosis measures a kind of departure of X from Gaussianity, as k{z) = 0. 
Empirically, given n realizations of X, the excess kurtosis statistic is defined as: 


(^1 ) ) • • • ) ) 


[TT 

■ 1 ^4 

n o 

V 24 

UiJiixrf 1 


(2.7) 
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When the null is true, the excess kurtosis statistic is asymptotically normal: 

^n(^l) ^2 :■■■■> -^n) -^(0) 1)) ^ ^ CX3) 

thus for large n, the p-value of the excess kurtosis is approximately: 

p = ^-\Kn{Xi,X2,...,Xn)), 

where $(•) is the survival function (upper tail probability) of A^(0,1). 

It is proved in m that the excess kurtosis is asymptotically optimal for the hypothesis 
testing of (tO) - (ESI) if 

E[W^] < oo. 

However, when = oo, even though kurtosis is well-defined < oo), there are 

situations in which LRT is able to reliably detect but excess kurtosis completely fails. In fact, 
by assuming (EH - (ESI) with an a < 8, if (a, r) falls into the blue region of Figure [21 then 
LRT is able to reliably detect, however, excess kurtosis completely fails. This shows that in such 
cases, excess kurtosis is not optimal; see CHI- 

Max (M„). The largest (absolute) observation is a classical and frequently-used non- 
parametric statistic: 

M„ = max(|Xi|,|X 2 |,...,|X„|), 

under the null hypothesis, 

XIn ~ a/ 2 log n, 

and moreover, by normalizing Mn with constants c„ and dn, the resulting statistic converges to 
the Gumbel distribution E^, whose cdf is 


M-n Cn 

dn 


En 


where approximately 

= Cn = X - 0.5772dn; 

TT 

here X and Sn are the sample mean and sample standard deviation of respectively. 

Thus a good approximation of the p-value for M„ is: 


p = exp(—exp( 


- 


dn 


))• 


We have tried the above experiment for n = 244^, and found that taking Cn = 4.2627, dn = 
0.2125 gives a good approximation. 

Assuming (E31) - (1^ and a < 8, or A = n ^ and that W has a power-law tail with a < 8, 
it is proved in CHI that Max is optimal for hypothesis testing (EIS) - ESI)- Recall if we further 
assume \ < f then asymptotically, excess kurtosis completely fails; however. Max is able 
to reliably detect and is competitive to LRT. 

On the other hand, recall that excess kurtosis is optimal for the case a > 8. In comparison, 
in this case. Max is not optimal. In fact, if we further assume ^ ^ then excess kurtosis 

is able to reliably detect, but Max will completely fail. 

In Figure |21 we compared the detectable regions of the excess kurtosis and Max in the a-r 
plane. 
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Figure 3: Primary Cosmic Microwave Background anisotropies (left) and simulated cosmic string 
map (right). 


To conclude this section, we mention an alternative way to approximate the p-values for 
any statistic T^. This alternative way is important in case that an asymptotic (theoretic) 
approximation is poor for moderate large n, an example is the statistic HC* we will introduce in 
Section ESI this alternative way is helpful even when the asymptotic approximation is accurate. 
Now the idea is, under the null hypothesis, we simulate a large number {N = 10^ or more) of 
Tn- Tn^\Ti^\ ...,we then tabulate them. For the observed value tn{Xi,X 2 , ..., Xn), the 
p-value will then be well approximated by: 

>tn{Xi,X2,...,Xn)}, 

and the larger the N, the better the approximation. 

2.3 Heuristic Approach 

We have exhibited a phase-change phenomenon, where the asymptotically optimal test changes 
depending on power-law index a. In this section, we develop a heuristic analysis of detectability 
and phase change. 

The detection property of Max follows from comparing the ranges of data. Recall that 
Xi = y/l — Xji- Zi + \/X^-Wi, the range of is roughly (—v'2 log n, ^/2\o^), and the range 

of {\/^ ■ Wi}2=i is ■ (—na,na) = (—na“ 2 ^na“ 2 ); so heuristically, 

Mn ~ max{Y^MogTi,na“ 2 }; 


for large n, notice that: 

na ~2 ^ -y/ 2logn, if r < ^, na“2 <C -\/2logn, if r > ^, 

thus if and only if r < M„ for the alternative will differ significantly from for the null, 
and so the criterion for detectability by Max is r < 

Now we study detection by excess kurtosis. Heuristically, 

Kn ~ (l/\/^) • k(a/1 - A„ • • '^i) = (1 /■ Vn- «(^) = 

thus if and only if r < 1 will Kn for the alternative differ significantly from under the null, 
and so the criterion for detectability by excess kurtosis is r < 1. 

This analysis shows the reason for the phase change. In Figure O when the parameter 
(a,r) is in the blue region, for sufficiently large n, na ~2 ^ -^21ogn and the strongest evidence 
against the null is in the tails of the data set, which Mn is indeed using. However, when (a, r) 
moves from the blue region to the yellow region, na ~2 \/21ogn, the tails no longer contain 
any important evidence against the null, instead, the central part of the data set contain the 
evidence. By symmetry, the and the 3'"'^ moments vanishes, and the 2”'^ moment is 1 by the 
normalization; so the excess kurtosis is in fact the most promising candidate of detectors based 
on moments. 

The heuristic analysis is the essence for theoretic proof as well as empirical experiment. Later 
in Section El we will have more discussions for comparing the excess kurtosis with Max down 
this vein. 
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Figure 4: Simulated observation containing the CMB and the CS (A = 0.18). 


size of 

4-th 

5-th 

6-th 

7-th 

8-th 

sub-sample 

moment 

moment 

moment 

moment 

moment 

n 

30.0826 

262.6756 

2.7390 X 10^ 

3.2494 X 10"^ 

4.2430 X 10^ 

n/2 

29.7100 

256.3815 

2.6219 X 10^ 

2.9697 X 10"^ 

3.7376 X 10^ 

n/2^ 

29.6708 

250.0520 

2.4333 X 10^ 

2.6237 X 10"^ 

3.0239 X 10^ 

n/2^ 

29.4082 

246.3888 

2.3158 X 10^ 

2.4013 X lO'^ 

2.3956 X 10^ 

n/2^ 

27.8039 

221.9756 

1.9615 X 10^ 

1.9239 X 10"^ 

1.8785 X 10^ 


Table 1: Empirical estimate 4-th, 5-th, 6-th, 7-th, and 8-th moments calculated using a subsam¬ 
ples of size n/2^ of with k = 0,1, 2,3,4. The table suggests that the 4-th, 5-th, and 

6-th moments are hnite, but the 7-th and 8-th moments are infinite. 


3 Wavelet Coefficients of Cosmic Strings 

3.1 Simulated Astrophysical Signals 

The temperature anisotropies of the CMB contain the contributions of both the primary cosmo¬ 
logical signal, directly related to the initial density perturbations, and the secondary anisotropies. 
The latter are generated after matter-radiation decoupling m They arise from the interaction 
of the CMB photons with the neutral or ionised matter along their path EHiEaiHn]- 

In the present study, we assume that the primary CMB anisotropies are dominated by the 
fluctuations generated in the simple single field inflationary Cold Dark Matter model with a 
non-zero cosmological constant. The CMB anisotropies have therefore a Gaussian distribution. 
We allow for a contribution to the primary signal from topological defects, namely cosmic strings 
(CS), as suggested in [Tn| . 

We use for our simulations the cosmological parameters obtained from the WMAP satellite 

and a normalization parameter cjg = 0.9. Finally, we obtain the so-called “simulated observed 
map”, D, that contains the two previous astrophysical components. It is obtained from Dx = 
\/l — ACMB -|- v/ACS, where CMB and CS are respectively the CMB and the cosmic string 
simulated maps. A = 0.18 is an upper limit constant derived by m- All the simulated maps 
have 500 x 500 pixels with a resolution of 1.5 arcminute per pixel. 


3.2 Evidence for E[VF^] = cxo 


For the wavelet coefficients on the finest scale of the cosmic string map in the right panel of 
Figure H by throwing away all the coefficients related to pixels on the edge of the map, we 
have n = 244^ coefficients; we then normalize these coefficients so that the empirical mean and 
standard deviation are 0 and 1 respectively; we denote the resulting dataset by 

Assuming are independent samples from a distribution VF, we have seen in Section 

2 that, whether excess kurtosis is better than Max depends on the finiteness of E[IF®]. We now 
analyze {wi}'^^^ to learn about E\W^]. 

Let 



i=l 


be the empirical 8-th moment of W using n samples. 


In theory, if E[IF®] < oo, then 














as n ^ oo. So one way to see if -E[Ty®] is finite is to observe how changes with n. 
Technically, since we only have n = 244^ samples, we can compare 

fc = 0,l,2,3,4; 

if these values are roughly the same, then there is strong evidence for ill [IT®] < oo; otherwise, if 

they increase with sample size, that is evidence for £1[1T®] = oo. Here ^ is an estimate of 

E\W^] using n/2^ sub-samples of 

For A: = 1, 2,3,4, to obtain rhg ' , we randomly draw sub-samples of size n/2^ from 

and then take the average of the 8-th power of this subsequence; we repeat this process 50,000 

(n/2^) 

times, and we let mg ' ' be the median of these 50,000 average values. Of course when k = 0, 

m.g”''^^ ^ is obtained from all n samples. 

The results correspond to the first wavelet band are summarized in Table ^ ^From the 
table, we have seen that is significantly larger than mg"'^*^ and mg”"^^^^; this supports that 
F1[1T®] = oo. Similar results were obtained from the other bands. In comparison, in Tabled we 
also list the 4-th, 5-th, 6-th, and 7-th moments. It seems that the 4-th, 5-th, and 6-th moments 
are finite, but the 7-th and 8-th moments are infinite. 


3.3 Power-law Tail of IT 


Typical models for heavy-tailed data include exponential tails and power-law tails. We now 
compare such models to the data on wavelet coefficients for IT; the Gaussian model is also 
included as comparison. 

We sort the jtCij’s in descending order, > |rc|( 2 ) > ... > |'fr|(n)) and take the 50 largest 

samples |rc|(i) > |rc|( 2 ) > ... > |'(i'|( 5 o)- Tor a power-law tail with index a, we expect that for 
some constant Ca, 

log(l) log(C'„) - alog(|w|(i)), 1 < z < 50, 

so there is a strong linear relationship between log(^) and log(|t(;|(j)). Similarly, for the ex¬ 
ponential model, we expect a strong linear relationship between log(^) and |zu|(j), and for the 
Gaussian model, we expect a strong linear relationship between log(^) and 

For each model, to measure whether the “linearity” is sufficient to explain the relationship 
between log(^) and log(|t(;|(j)) (or or we introduce the following z-score: 


Zi = y/n 


Pi-i/n 
i/n{l — i/n)\ ’ 


(3.8) 


where pi is the linear fit using each of the three models. If the resulting z-scores is random and 
have no specific trend, the model is appropriate; otherwise the model may need improvement. 

The results are summarized in Figure [51 The power-law tail model seems the most appro¬ 
priate: the relationship between log(^) and log(|t(;|(j)) looks very close to linear, the z-score 
looks very small, and the range of z-scores much narrower than the other two. For the expo¬ 
nential model, the linearity is fine at the first glance, however, the z-score is decreasing with i, 
which implies that the tail is heavier than estimated. The Gaussian model fits much worse than 
exponential. To summarize, there is strong evidence that the tail follows a power-law. 

Now we estimate the index a for the power-law tail. A widely-used method for estimating 
a is the Hills’ estimator m- 


HO 

= 


/ + 1 


ELi*iog( 


kip) ■ 

klp+p- 
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Figure 5: Left panel: from top to bottom, plots of log-probability log(i/n) versus log(|rt;|(j), 
and for 1 < i < 50, corresponding to the power-law/exponential/Gaussian mod¬ 

els we introduced in Section 13 w are the wavelet coefficients of the hnest scale (i.e. highest 
frequencies). Right panel: from top to bottom, normalized ^;-score as dehned in (jILSI) for the 
power-law/exponential/Gaussian models, where again for 1 <i < 50. 


Figure 6: Top row: left panel, from top to bottom, the fraction of detection for the excess 
kurtosis, HC* and Max, the x-axis is the corresponding A; right panel, from top to bottom, the 
fraction of detection for kurtosis, HC~^ and Max. Bottom row: left panel, for top to bottom, 
ROC curves for the excess kurtosis, HC*, and Max; right panel, ROC curves for the excess 
kurtosis, HC^, and Max. 


where I is the number of (the largest) to include for estimation. In our situation, I = 50 

and 

a = = 6.134; 

we also found that the standard deviation of this estimate ~ 0.9. Table El gives estimates of a 
for each band of the wavelet transform. This shows that a. is likely to be only slightly less than 
8: this means the performance of excess kurtosis and Max might be very close empirically. 


3.4 Comparison of Excess Kurtosis vs. Max with Simulation 

To test the results in Section Eia we now perform a small simulation experiment. A complete 
cycle includes the following steps, (n = 244^ and are the same as in Section ESI). 

1. Let A range from 0 to 0.1 with increment 0.0025. 

2. Draw [zi, 2 : 2 ,..., Zn) independently from iV(0,1) to represent the transform coefficients for 
CMB. 

3. For each A, let 

Xi = Xf ^ = \/l - \zi + \/\wi, A = 0,0.0025,... , 0.1 
represent the transform coefficients for CMB -|- CS. 

4. Apply detectors Kn, to the and obtain the p-values. 

We repeated the step 3-4 independently 500 times. 

Based on these simulations, hrst, we have estimated the probability of detection under various 
A, for each detector: 


Fraction of detections 


number of cycles with a p-value <0.05 
500 


Results are summarized in Figure 13 

Second, we pick out those simulated values for A = 0.05 alone, and plot the ROC curves for 
each detector. The ROC curve is a standard way to evaluate detectors IMl; the x-axis gives 
the fraction of false alarms (the fraction of detections when the null is true (i.e. A = 0)); the 
y-axis gives the corresponding fraction of true detections). Results are shown in Figure 13 The 
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Figure 7: The M-k plane and the curve k = ko{M), where M is the largest (absolute) observation 
of Wi's, and n is the empirical excess kurtosis of rcj’s, where rcj’s are the wavelet coefficients of 
the simulated cosmic string. Heuristically, if (M, k) falls above the curve, excess kurtosis will 
perform better than Max. The red star represent the points of (M, k) = (17.48, 27.08) for the 
current data set rcj’s, which is far above the curve. 


figure suggests that the excess kurtosis is slightly better than M„. We also show an adaptive 
test, HCn in two forms (HC* and 77(7+); these will be described later. 

We now interpret. As our analysis predicts that W has a power-law tail with i7[kF®] = oo, 
it is surprising that excess kurtosis still performs better than Max. 

In Section 1231 we compared excess kurtosis and Max in a heuristic way; here we will continue 
that discussion, using now empirical results. Notice that for the data set {wi,W 2 , ■ ■ ■ ,Wn), the 
largest (absolute) observation is: 

M = Mn = 17.48, 

and the excess kurtosis is: 

K = «:„ = - [V = 27.08. 

n 

i 

In the asymptotic analysis of Section [2.81 we assumed k(W) is a constant. However for n = 244^, 
we get a very large excess kurtosis 27.08 ~ nP'^; this will make excess kurtosis very favorable in 
the current situation. 

Now, in order for M„ to work successfully, we have to take A to be large enough that 

y/XM > -s/ 2 log n 


so A > 0.072. The p-value of is then: 


exp(—exp( 


\/AM - 4.2627 
0.2125 


)), 


moreover, the p-value for excess kurtosis is heuristically 


$ ^(-y/nA^K); 


setting them to be equal, we can solve k in terms of M: 

K = Kq{M). 

The curve k = kq{M) separates the M-k plane into 2 regions: the region above the curve is 
favorable to the excess kurtosis, and the region below the curve is favorable to Max. See Figure 
[3 In the current situation, the point (M, k) = (17.48, 27.08) falls far above the curve; this 
explains why excess kurtosis is better than Max for the current data set. 

3.5 Experiments on Wavelet Coefficients 
3.5.1 CMB + CS 

We study the relative sensitivity of the different wavelet-based statistical methods when the 
signals are added to a dominant Gaussian noise, i.e. the primary CMB. 

We ran 5000 simulations by adding the 100 CMB realisations to the CS {D{X, i) = \/l — ACMBj-|- 
^/XCS, i = 1...100), using 50 different values for A, ranging linearly between 0 and 0.18. Then 
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Multi-scale Method 

Alpha 

Bi-orthogonal Wavelet 


Scale 1, Horizontal 

6.13 

Scale 1, Vertical 

4.84 

Scale 1, Diagonal 

4.27 

Scale 2, Horizontal 

5.15 

Scale 2, Vertical 

4.19 

Scale 2, Diagonal 

3.83 

Scale 3, Horizontal 

4.94 

Scale 3, Vertical 

4.99 

Scale 3, Diagonal 

4.51 

Scale 4, Horizontal 

3.26 

Scale 4, Vertical 

3.37 

Scale 4, Diagonal 

3.76 


Table 2: Table of a values for which the different wavelet bands of the CS map. 


Figure 8: For the nine first bands of the wavelet transform, the mean p-value versus A. The 
solid, dashed and dotted lines correspond respectively to the excess kurtosis, the HC and Max. 


we applied the bi-orthogonal wavelet transform, using the standard 7/9 filter |1] to these 5000 
maps. On each band b of the wavelet transform, for each dataset D{X,i), we calculate the 
kurtosis value x In order to calibrate and compare the departures from a Gaussian 
distribution, we have simulated for each image D{X,i) a Gaussian Random Field G{X,i) which 
has the same power spectrum as D{X,i), and we derive its kurtosis values RG(fe,A,i)- ^ given 
band b and a given A, we derive for each each kurtosis its p-value pK{b,X,i) under 

the null hypothesis (i.e. no CS) using the distribution of Rg(6,a,*)- The mean p-value pK^b, A) 
is obtained by taking the mean of PK{b, A, *). For a given band b, the curve PK{b, A) versus A 
shows the sensitivity of the method to detect CS. Then we do the same operation, but replacing 
the kurtosis by HC and Max. Figure El shows the mean p-value versus A for the nine finest scale 
subbands of the wavelet transform. The first three subbands correspond to the finest scale (high 
frequencies) in the three directions, respectively horizontal, vertical and diagonal. Bands 4,5, 
and 6 correspond to the second resolution level and bands 7,8,9 to the third. Results are clearly 
in favor of the excess kurtosis. 

The same experiments have been repeated, but replacing the bi-orthogonal wavelet transform 
by the undecimated isotropic a trous wavelet transform. Results are similarly in favor of the 
excess kurtosis. Table 01 gives the A values (multiplied 100) for which the CS are detected at a 
95% confidence level. Only bands where this level is achieved are given. Smaller the A, better the 
the sensibility of the method to the detect the CS. These results show that the excess kurtosis 
outperforms clearly HC and Max, whatever the chosen multiscale transform and the analyzed 
scale. 

No method is able to detect the CS at a 95% conhdence level after the second scale in these 
simulations. In practice, the presence of noise makes the detection even more difficult, especially 
in the finest scales. 
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Multi-scale Method 

excess kurtosis 

HC 

MAX 

Bi-orthogonal Wavelet 

Scale 1, Horizontal 

0.73 

0.73 

0.73 

Scale 1, Vertical 

0.73 

0.73 

0.73 

Scale 1, Diagonal 

0.38 

0.38 

0.38 

Scale 2, Horizontal 

8.01 

9.18 

8.81 

Scale 2, Vertical 

6.98 

8.44 

10.65 

Scale 2, Diagonal 

2.20 

2.94 

2.57 

A trous Wavelet Transform 
Scale 1 

1.47 

1.47 

1.47 

Scale 2 

9.91 

12.85 

16.53 

Curvelet 

Scale 1, Band 1 

1.47 

2.20 

3.30 

Scale 1, Band 2 

13.59 

16.90 

- 

Scale 2, Band 1 

11.38 

14.32 

- 


Table 3: Table of A values (multiplied 100) for CS detections at 95% confidence. 


Multi-scale Method 

excess kurtosis 

HC 

MAX 

Bi-orthogonal Wavelet 

Scale 1, Horizontal 

0.30 

0.32 


Scale 1, Vertical 

0.32 

0.32 

- 

Scale 1, Diagonal 

0.06 

0.06 

0.24 

Scale 2, Horizontal 

- 

- 

- 

Scale 2, Vertical 

- 

- 

- 

Scale 2, Diagonal 

0.65 

0.71 

- 

A trous Wavelet Transform 
Scale 1 

0.41 

0.47 


Curvelet 

Scale 1, Band 1 

0.59 

0.69 

0.83 


Table 4: Table of A values for which the SZ detections at 95% confidence. 


3.5.2 CMB + SZ 

We now consider a totally different contamination. Here, we take into account the secondary 
anisotropies due to the kinetic Sunyaev-Zel’dovich (SZ) effect jlS]- The SZ effect represents the 
Compton scattering of CMB photons by the free electrons of the ionised and hot intra-cluster 
gas. When the galaxy cluster moves with respect to the CMB rest frame, the Doppler shift 
induces additional anisotropies; this is the so-called kinetic SZ (KSZ) effect. The kinetic SZ 
maps are simulated following Aghanim et al [21 and the simulated observed map D is obtained 
from D\ = CMB -|- AKSZ, where CMB and KSZ are respectively the CMB and the kinetic 
SZ simulated maps. We ran 5000 simulations by adding the 100 CMB realisations to the KSZ 
{D{X, i) = CMBj -|-AKSZ, i = 1... 100), using 50 different values for A, ranging linearly between 
0 and 1. The p-values are calculated just as in the previous section. 

Table jl] gives the A values for which SZ is detected at a 95% confidence level for the three 
multiscale transforms. Only bands were this level is achieved are given. Results are again in 
favor of the Kurtosis. 
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4 Curvelet Coefficients of Filaments 


Curvelet Analysis was proposed by Candes and Donoho (1999) jT2] as a means to efficiently 
represent edges in images; Donoho and Flesia (2001) [Ej showed that it could also be used to 
describe non-Gaussian statistics in natural images. It has also been used for a variety of image 
processing tasks: [TTn EHl EH. We now consider the use of curvelet analysis for detection of 
non-Gaussian cosmological structures which are filamentary. 

4.1 Curvelet Coefficients of Filaments 

Suppose we have an image I which contains within it a single filament, i.e. a smooth curve 
of appreciable length L. We analyse it using the curvelet frame. Applying analysis techniques 
described carefully in na, we can make precise the following claim: at scale s = 2 ^ there will 
be about 0{L2l^‘^) significant coefficients caused by this filamentary feature, and they will all 
be of roughly similar size. The remaining 0(4-^) coefficients at that scale will be much smaller, 
basically zero in comparison. 

The pattern continues in this way if there is a collection of m filaments of individual lengths 
Li and total length L = Li + ■ ■ ■ + Lm. Then we expect roughly 0{L2l/‘^) substantial coefficients 
at level j, out of 4-^ total. 

This suggests a rough model for the analysis of non-Gaussian random images which contain 
apparent ‘edgelike’ phenomena. If we identify the edges with filaments, then we expect to see, 
at a scale with n coefficients, about nonzero coefficients. Assuming all the edges are 

equally ‘pronounced’, this suggests that we view the curvelet coefficients of / at a given scale 
as consisting of a fraction e = nonzeros and the remainder zero. Under this model, the 

curvelet coefficients of a superposition of a Gaussian random image should behave like: 

W = (1 - e)iV(0,1) + 1) + 1), (4.9) 

where e are the fraction of large curvelet coefficients corresponding to filaments, and p is the 
amplitude of these coefficients of the non-Gaussian component N. 

The problem of detecting the existence of such a non-Gaussian mixture is equivalent to 
discriminating between the hypotheses: 

Ho-. W~A^(0,1), (4.10) 

i/f”) ■.Xi = {l- e„)iV(0, 1) + yiV(-//n, 1) + yiV(//n, 1), (4.11) 

and = 0 is equivalent to = 0. 


4.2 Optimal Detection of Sparse Mixtures 

When both e and p are known, the optimal test for Problem (10)1) - (EtTI) is simply the Neyman- 
Pearson Likelihood ratio test (LRT), |,‘12[ Page 74]. Asymptotic analysis shows the following, 

[211 EH]. 

Suppose we let = n~^ for some exponent (5 e (1/2,1), and 

Pn = \/2slog(n), 0 < s < 1. (4.12) 

There is a threshold effect: setting 


pm 


P-h 

{i-vrm?, 
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1/2 < (3 < 3/4, 
3/i<(5< 1. 


(4.13) 






Figure 9: The detection boundary separates the square in the f5-s plane into the detectable 
region and the undetectable region. When (/3, s) falls into the estimable region, it is possible 
not only to reliably detect the presence of the signals, but also estimate them. 


then as n —> oo, LRT is able to reliably detect for large n when s > p^iP), and is unable to 
detect when s < ISO]) HHi and m- Since LRT is optimal, it is not possible for any 

statistic to reliably detect when s < P2ic()- We call the curve s = p^iP) in the /3-s plane the 
detection boundary; see Figure El 

We also remark that if the sparsity parameter /3 < 1/2, it is possible to discriminate merely 
using the value of the empirical variance of the observations or some other simple moments, and 
so there is no need for advanced theoretical approaches. 


4.3 Adaptive Testing using Higher Criticism 

The Higher Criticism statistic (HC), was proposed in where it was proved to be asymptot¬ 
ically optimal in detecting (liTOl) - (lirTl) . 

To define HC first we convert the individual X^s into p-values for individual z-tests. Let 
Pi = P{N{0, 1) > Xi} be the p-value, and let denote the p-values sorted in increasing 
order, the Higher Criticism statistic is defined as: 


HCl = max 


Vn[i/n - P{i)]l^Jp{^){l-P(^)) 


or in a modified form; 


H CiP = max 

{i: l/n<p(i)<l-l/n} 


Vn[i/n - P(i)]/^Jp(i){l - P{i)) 


we let HCn refer either to HC* or HC^ whenever there is no confusion. The above definition 
is slightly different from d, but the ideas are essentially the same. 

With an appropriate normalization sequence: 

On = \/2 log log n, hn = 2 log log n -|- 0.5 log log log n — 0.51og(47r), 

the distribution of HCn converges to the Gumbel distribution E^, whose cdf is exp(—4exp(—x)), 

[321: ^ 

OnHCji bn Py ) 

SO the p-values of HCn are approximately: 


exp(-4exp(-[a„iLC'n - bn])). (4.14) 

For moderately large n, in general, the approximation in (imi) is accurate for the HCip, but not 
for HC*. For n = 244^, taking On = 2.2536 and = 3.9407 in (14.1411 gives a good approximation 
for the p-value of HCip. 

A brief remark comparing Max and HC. Max only takes into account the few largest obser¬ 
vations, HC takes into account those outliers, but also moderate large observations; as a result, 
in general, HC is better than Max, especially when we have unusually many moderately large 
observations. However, when the actual evidence lies in the middle of the distribution both HC 
and Max will be very weak. 
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Figure 10: From top to bottom: the image of the bar, the log-histogram of the curvelet coeffi¬ 
cients of the bar, and the qqplot of the curvelet coefficients vs. Normal. 


Figure 11: For the curvelet coefficients Uj’s of the simulated CS map in Figure El from top to 
bottom: the log-histogram of Uj’s, the qqplot of Uj’s vs. Normal, and the qqplot of sign(uj)|uj|°'®^^ 
vs. double exponential. 

4.4 Curvelet Coefficients of Cosmic Strings 

In Section El we studied wavelet coefficients of simulated cosmic strings. We now study the 
curvelet coefficients of the same simulated maps. 

We now discuss empirical properties of Curvelet coefficients of (simulated) cosmic strings. 
This was first deployed on a test image showing a simple ‘bar’ extending vertically across the 
image. The result, seen in Figure lTnl shows the image, the histogram of the curvelet coefficients at 
the next-to-finest scale, and the qq-plot against the normal distribution. The display matches in 
general terms the sparsity model of section El That display also shows the result of superposing 
Gaussian noise on the image; the curvelet coefficients clearly have the general appearance of a 
mixture of normals with sparse fractions at nonzero mean, just as in the model. 

We also applied the curvelet transform to the simulated cosmic string data. Figure fTTI shows 
the results, which suggest that the coefficients do not match the simple sparsity model. Extensive 
modelling efforts, not reported here, show that the curvelet coefficients transformed by |u|0-8i5 
have an exponential distribution. 

This discrepancy from the sparsity model has two explanations. First, cosmic string images 
contain (to the naked eye) both point-like features and curvelike features. Because curvelets are 
not specially adapted to sparsifying point-like features, the coefficients contain extra information 
not expressible by our geometric model. Second, cosmic string images might contain filamentary 
features at a range of length scales and a range of density contrasts. If those contrasts exhibit 
substantial amplitude variation, the simple mixture model must be replaced by something more 
complex. In any event, the curvelet coefficients of cosmic strings do not have the simple structure 
proposed in Section 

When applying various detectors of non-Gaussian behavior to curvelet coefficients, as in 
the simulation of Section ITfiL we find that, despite the theoretical ideas backing the use of 
HC as an optimal test for sparse non-Gaussian phenomena, the kurtosis consistently has better 
performance. The results are included in Tables El and E) 

Note that, although the curvelet coefficients are not as sensitive detectors as wavelets in this 
setting, that can be an advantage, since they are relatively immune to point-like features such 
as SZ contaimination. Hence they are more specific to CS as opposed to SZ effects. 

5 Conclusion 

The kurtosis of the wavelet coefficients is very often used in astronomy for the detection non- 
Gaussianities in the CMB. It has been shown EH that it is also possible to separate the non- 
Gaussian signatures associated with cosmic strings from those due to SZ effect by combining the 
excess kurtosis derived from these both the curvelet and the wavelet transform. We have studied 
in this paper several other transform-based statistics, the MAX and the Higher Criticism, and 
we have compared them theoretically and experimentally to the kurtosis. We have shown that 
kurtosis is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian 
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contamination with finite 8-th moments, while HC and MAX are asymptotically optimal in the 
class of weakly dependent symmetric non-Gaussian contamination with infinite 8-th moment. 
Hence depending on the nature of the non-Gaussianity, a statitic is better than another one. 
This is a motivation for using several statistics rather than a single one, for analysing CMB 
data. Finally, we have studied in details the case of cosmic string contaminations on simulated 
maps. Our experiment results show clearly that kurtosis outperforms Max/HG. 
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